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We estimate the quantum state of a light beam from results of 
quantum homodyne measurements performed on identically prepared 
quantum systems. The state is represented through the Wigner func- 
tion, a generalized probability density on R 2 which may take negative 
values and must respect intrinsic positivity constraints imposed by 
quantum physics. The effect of the losses due to detection inefficien- 
cies, which are always present in a real experiment, is the addition 
to the tomographic data of independent Gaussian noise. 

We construct a kernel estimator for the Wigner function, prove 
that it is minimax efficient for the pointwise risk over a class of in- 
finitely differentiable functions, and implement it for numerical re- 
sults. We construct adaptive estimators, that is, which do not depend 
on the smoothness parameters, and prove that in some setups they 
attain the minimax rates for the corresponding smoothness class. 

1. Introduction. In 1932 Wigner published a seminal paper [30] in which 
he introduced a fundamental tool for quantum mechanics known these days 
as the Wigner function. Glauber extended such techniques to quantum optics 
where phase space representations of quantum states play an important role 
in detecting quantum effects in light [7, 13]. 

Quantum homodyne tomography (QHT) is a technique for reconstructing 
the state of a quantum system from measurement data. It was theoretically 
proposed in [29] and put in practice for the first time by Smithey et al. [26]. 
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This method allows quantum opticians to visualize the Wigner function of 
newly created states of light and verify whether the theoretical predictions 
agree with the statistical findings. We mention a few experiments such as 
the creation of squeezed states [5] and of single-photon-added coherent states 
[31]. 

Various aspects of the corresponding ill-posed inverse problem have been 
analyzed in [9, 23] and [22], and different estimation methods have been 
proposed by Banaszek et al. [3] and Lvovsky [24]. For an overview of the QHT 
problem in quantum optics we refer to [21] and for more recent developments 
to [25]. 

This paper addresses the statistical problem of estimating the Wigner 
function of a beam of light from results of QHT measurements on indepen- 
dent, identically prepared beams. 

One way to think about quantum tomography as a statistical problem 
is as follows: the unknown parameter is a joint density W of two variables, 
Q and P. We observe the random variable (X, $) = (cos(<J>)Q + sin($)P, <3?) 
where $ is chosen independently of (Q,P), and uniformly in the interval 
[0, 7r]. The joint density of (X, $) can be expressed mathematically in terms 
of the joint density W of (Q, P), which is allowed to take negative as well as 
positive values, subject to certain restrictions which guarantee that (X, <£) 
does have a proper probability density. In an ideal situation W would be 
a density function and then the statistical problem would be to estimate 
W from independent samples of (X, <£). In the context of positron emission 
tomography this problem has been addressed in [8] , which provides minimax 
rates for the pointwise risk on a class of "very smooth" probability densities. 
The quantum tomography version where W is a proper Wigner function is 
treated along similar lines in [16] with the important difference that the 
proof of the lower bound requires the construction of a "worst parametric 
family" of Wigner functions rather than probability densities. 

In this paper we consider a statistical problem which is more relevant 
for the experimentalist confronted with various noise sources corrupting the 
ideal data (X, $). It turns out that a good model for a realistic quantum 
tomography measurement amounts to replacing (X, by the noisy obser- 
vations (Y, $), where Y := y/rjX + — ??)/2£, with £ a standard Gaussian 
random variable independent of (X, <!>). The parameter < r] < 1 is called the 
detection efficiency and represents the proportion of photons which are not 
detected due to the losses in the measurement process. This is the statistical 
problem of this paper, a combination of two classical problems: noise decon- 
volution and PET tomography. The nonclassical feature is that although 
all the one-dimensional projections of W are indeed bona fide probability 
densities, the underlying two-dimensional "joint density" need not itself be 
a bona fide joint probability density, but can have small patches of "negative 
probability." 
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So far there has been little attention paid to this problem by statisticians, 
although on the one hand it is an important statistical problem coming up in 
modern physics, and on the other hand it is "just" a classical nonpar ametric 
statistical inverse problem. A first step in the direction of estimating p has 
been made in [2] , where consistency results are presented for linear and sieve 
maximum likelihood estimators. We recommend this paper as a complement 
to the present one. 

Section 2 starts with a short introduction to quantum mechanics followed 
by the particular problem of estimating the Wigner function in quantum 
homodyne tomography. In Section 2.3 we describe some features of Wigner 
functions and show to what extent these functions differ from probability 
densities on the plane. The section ends with a description of the experimen- 
tal set-up and the derivation of the Gaussian noise from physical principles. 

Section 3 contains the main results of this paper. We assume that the 
unknown Wigner function belongs to a class A(f3, r, L) of "very smooth" 
functions similar to those of [6, 8] and [16]. The estimator has a standard 
kernel-type form performing in one step the deconvolution and the inverse 
Radon transform. In Proposition 1 we compute upper bounds for the point- 
wise risk. Theorem 1 establishes the lower bound and gives the minimax 
rate, which is slower than any power of 1/n but faster than any power of 
1/logn. Rates with a similar behavior have been obtained in [6], which in- 
spired some of the results obtained in this paper. Adaptive estimators can 
be derived in some cases (when r < 1) (see Theorem 2), converging at the 
same rates as their nonadaptive correspondents. 

In Section 4 we present results of computer simulations for a few quantum 
states, among which is the Schrodinger cat state which is expected to be 
produced in the lab in the future. Section 5 collects the proof of Proposition 1 
and a sketch of the proof of the adaptive upper bounds. 

Section 6 concentrates on the proof of the lower bound for the pointwise 
risk. For this we construct a pair of Wigner functions belonging to 

the class A((3, r, L) such that the distance between them is large enough 
and the y 2 distance between the likelihoods of the corresponding models is 
small. It is now a well-known lower-bounds principle that the best rate of 
estimation can be viewed as the largest distance between parameters in order 
to detect the change in the statistical model. This construction is original 
in the statistics literature as it relies on the positivity of the corresponding 
density matrices p\ and p2 rather than of the Wigner functions themselves. 

2. Physical background of quantum tomography. In this section we present 
a short introduction to quantum mechanics in as far as it is needed for un- 
derstanding the background of our statistical problem. The reader who is 
not interested in the physics can skip this section and continue with Sec- 
tion 3. In Section 2.2 we describe the measurement technique called quan- 
tum homodyne tomography and show how this can be used to estimate the 
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Wigner function which is a particular parametrization of the quantum state 
of a monochromatic pulse of light. More details on Wigner functions can be 
found in Section 2.3. The main issue tackled in this paper is the influence 
of noise due to the detection process on the estimation of the Wigner func- 
tion. The experimental setup of quantum homodyne tomography with noisy 
observations is discussed in Section 2.4. 

For more background material we refer to the textbook [21] on quantum 
optics and quantum tomography, the paper [2] which deals with the problem 
of quantum tomography from a statistical perspective, the review paper 
on quantum statistical inference [4] and the classic textbooks on quantum 
statistics [17] and [18]. 

2.1. Short excursion into quantum mechanics. Quantum mechanics is 
the theory which describes the physical phenomena taking place at the mi- 
croscopic level such as the emission and absorption of light by individual 
atoms, the detection of light photons. As a theory about physical reality, 
quantum mechanics makes predictions about the results of measurements 
performed in the lab. Such predictions are statistical in nature in the sense 
that in general we cannot infer the result of a measurement on a single 
quantum system but only the probability distribution of results of identical 
measurements performed on a statistical ensemble of identically prepared 
systems. Any such distribution is a function of the state in which the sys- 
tem is prepared, and of the performed measurement. Our statistical problem 
can then be briefly described as follows: estimate the state based on results 
of measurements on a number of identically prepared systems. 

Mathematically, the main concepts of quantum mechanics are formulated 
in the language of self-adjoint operators acting on Hilbert spaces. The reader 
who is not familiar with this theory may think of finite-dimensional Hilbert 
spaces C d , and d x d matrices as operators on C d . To every quantum system 
one can associate a complex Hilbert space TL with inner product (■, •) whose 
vectors represent the wave functions of the system or pure states, as we will 
see below. In general, a state is described by a density matrix, which is a 
compact operator p on TL having the following properties: 

1. Self-adjoint: p = p*, where p* is the adjoint of p. 

2. Positive: p > 0, or equivalently (ip,pip) > for all tp £ TL. 

3. Trace 1: Tr(p) = 1. 

The positivity property implies that all the eigenvalues of p are nonnegative, 
and by the trace property, they sum up to 1. The reader may have noticed 
that the above requirements are reminiscent of the properties of probability 
distributions, and this connection will be strengthened in a moment when 
we discuss the distribution of measurement results. 
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Before that we will take a look at the structure of the space of states 
on a given Hilbert space 7i. Clearly, the convex combination \p\ + (1 — 
A)p2 °f two density matrices p\ and p2 is a density matrix again and it 
corresponds to the state obtained as the result of randomly performing one 
of the two preparation procedures with probabilities A and, respectively, 
1 — A. The extremals of the convex set of states are called pure states and 
are represented by one-dimensional orthogonal projection operators. Indeed 
an arbitrary density matrix can be brought to the diagonal form 

dimH. 

i=i 

where Pj is the projection onto the one-dimensional space generated by the 
eigenvector e% S H of p and Aj > is the corresponding eigenvalue, that is, 
pei = \ei. 

The predictions made by quantum mechanics can be tested in the lab 
by performing measurements on quantum systems. We will now give the 
mathematical description of a measurement with space of outcomes given 
by the measure space (Q, S). If the system is prepared in the state p, then the 
result is random and has probability distribution P p over (Q, X) such that the 
map p i— > P p is affine, that is, it maps a convex combination of states into the 
corresponding convex combination of probability distributions. This can be 
naturally interpreted as saying that for any mixed state Xp± + (1 — X)p2, the 
distribution of the results will reflect the randomized preparation procedure. 

The most common measurement is that of an observable such as energy, 
position, spin, and so on. An observable is described by a self-adjoint opera- 
tor X = X* on the Hilbert space 7i and we suppose here for simplicity that 
it has a discrete spectrum, that is, it can be written in the diagonal form 

dimH 

(1) X = E X a P <» 

a=l 

with x a € M the eigenvalues of X, and P a one-dimensional projections onto 
the eigenvectors of X. The result of the measurement of the observable 
X will be denoted by X and is a random variable with values in the set 
SI = {x±,X2, . . .}. When the system is prepared in the state p, the result X 
has the distribution 

(2) F p [X = x a ]=Tr(P a p). 

Notice that the conditions defining the density matrices insure that P p is 
indeed a probability distribution. In particular, the expectation on X in the 
state p is 

dimW 

(3) E p [X] := J2 X ^ P [X = x a ] = Tr(Xp), 

a=l 
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and the characteristic function is given by 

(4) E p [exp(itX)]=Tr[exp(itX)p]. 

Measurements with continuous outcomes as well as outcomes in an arbitrary 
measure space can be described in a similar way by using the spectral theory 
of self-adjoint operators [18]. 

Suppose that a preparation procedure produces an unknown state p. It is 
clear that in general no individual measurement can completely determine 
the state but only gives us statistical information about ¥ p and thus indi- 
rectly about p. The problem of state estimation should then be considered 
in the context of measurements on a large number of systems which are 
identically prepared in the state p. Here we consider the simplest situation 
when we perform identical and independent measurements on each of the n 
systems separately. 

2.2. Quantum homodyne tomography and the Wigner function. The sta- 
tistical problem analyzed in this paper is that of estimating a function 
W p : M 2 ^ R from i.i.d. data (Yi, $i), . . . , (Y n , $ n ) with distribution ¥ r ' on 
Rx [0,7r]. In this subsection we will give an account of the physical origin 
of this problem. 

The quantum system is monochromatic light in a cavity, whose state is 
described by (infinite-dimensional) density matrices on the Hilbert space of 
complex- valued square integrable functions on the line L2(K). The function 
of interest W p is called the Wigner function and depends in a one-to-one 
fashion on the state p of the light. 

Two important observables of this quantum system are the electric and 
magnetic fields whose corresponding self-adjoint operators on are 
given by 

Qip(x) = xip(x) and, respectively, Pip(x) = —i-^-. 

The Wigner function W p : R 2 — ► R is much like a joint probability density 
for these quantities; for instance, its marginals along any direction cf) £ [0,7r] 
in the plane which are given by the Radon transform of W p , 

/oo 
W p (x cos <j> — t sin (f>, x sin <f> + t cos <fi) dt, 
-00 

are bona fide probability densities and correspond to the measurement of 
the quadrature observables := Qcos^ + Psin0. However, in quantum 
mechanics noncommuting observables such as Q and P cannot be measured 
simultaneously; thus we cannot speak of their joint probability distribution. 
This fact is reflected at the level of the Wigner function, which need not be 
positive; indeed, it might contain patches of "negative probability." 
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Thus, for a given quantum system prepared in state p we can measure 
only one of the quadratures for some phase (j) and we obtain a result 
with probability density p p (x\<p) = TZ[W p ](x, 4>). Let us consider now that we 
have n quantum systems prepared in the same state p and we measure the 
quadrature X^ on the ith system with phases $j chosen independently with 
uniform distribution on [0, 7r]. We obtain independent identically distributed 
results (Xi, 3>i), . . . , (X n , $ n ) with density p p (x,<j)) = p p (x\4>) with respect 
to the measure -A, where A is the Lebesgue measure on 1 x [0,7r]. The 
Radon transform 1Z: W p >— > pJx,(j)) is well known in statistics for its role 
in tomography problems such as positron emission tomography (PET) [28], 
and has a broad spectrum of other applications ranging from astronomy 
to geophysics [10]. In PET one estimates a probability density / on M? 
related to the tissue distribution in a cross section of the human body from 
i.i.d. observations (X%, . . . , (X n , <I> n ), with probability density equal to 
7Z[f]- The observations are obtained by recording events whereby pairs of 
photons emitted at the collision of a positron and an electron hit detectors 
placed in a ring around the body after flying in opposite directions along an 
axis determined by an angle (j) G [0, 7r]. The difference with our situation is 
that the role of the unknown distribution is played by the Wigner function, 
which as we mentioned is not necessarily positive in the usual sense but 
carries an intrinsic positivity constraint in the sense that it corresponds to 
a density matrix (see Section 2.3). Another difference with respect to PET 
is that in QHT the experimenter can decide how to choose the phases <&j. 
Indeed, in some experiments the phases are equidistant, that is, they take 
one of the values j-ir where I runs from to k — 1 for some fc£N, but one 
has now the additional problem of how to choose A; as a function of n. We 
believe that by using uniformly distributed phases one does not incur any 
loss in the asymptotic rates, but it remains an interesting open question 
whether a specially designed choice of phases can improve the results. This 
may be the case for some parametric classes of Wigner functions with an 
asymmetric aspect like those corresponding to squeezed states (see Section 
2.3). 

2.3. Properties of Wigner functions. The physics literature on Wigner 
functions and other types of "phase space functions" is vast, but a starting 
point for the interested reader may be the book [21]. Here we focus on the 
similarities and the differences with usual probability densities encountered 
in PET. 

Consider the space of Hilbert-Schmidt operators on L2(K), 

T 2 := {A G £(L 2 (K)) : ||A||1 = Tr(A* A) < oo}, 

on which there exists an inner product (A,B) 2 = Tr(A*B), and notice that 
the density matrices form a closed subset of 7^- The Wigner function W\ 
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is the image of A through the linear map W:T 2 -> L 2 (M 2 ) defined by the 
property that the Fourier transform T 2 with respect to both variables has 
the expression 

(6) W A (u,v) := F 2 [W A ](u,v) = Tr(Aexp(mQ + wP)). 

In particular, this defines the Wigner function W p of the state with density 
matrix p. By passing to the polar coordinates (u, v) = (tcos4>,tsm(p) we 
have uQ + vP = tX^, and using (4) together with the fact that p p {-\4>) is 
the density for measuring X^ we have 

(7) W p (u,v)=Tr(pexp(itX (j> ))=^ 1 [p p (-m(t), 

where the Fourier transform T\ in the last term is with respect to the first 
variable, keeping (f> fixed. The reader familiar with PET may recognize that 
the composition T 2 o T\ mapping p p into W p is just the inverse Radon 
transform [10], proving our assertion that QHT is about the tomography of 
the Wigner function. 

It can be shown that the map W:T 2 — >"L 2 {^ 2 ) is isometric up to a con- 
stant: 

(8) (A,B) 2 = 2vr(W A ,W B ) -=^JJ W A (q,p)W B (q,p)dqdp, 

and this fact is often used as a tool for calculating the expectation of an 
observable X 6 T 2 similarly to the way it is done in classical probability: 

(9) Tr(pX) = 2vr J J W^(q,p)W p (q,p) dqdp. 

Let us come back to our physical system, the light in a cavity, and con- 
sider its energy, which is given by the sum of intensities of the electric and 
magnetic fields H := ^(Q 2 + P 2 ). As predicted by Einstein before the cre- 
ation of quantum theory, the possible values that this observable may take 
are "quantized," which can be explained if we think of light as a packet 
of photons with each photon contributing a fixed quantum of energy. In- 
deed, by solving the eigenvalue problem we find Hipj = (j + l/2)ipj where 
{ipj}j>o is an orthonormal basis of IL-2(M) whose vectors have the physical 
interpretation of pure states with precisely j photons and are given by 



1 



(10) ^{x)= r ^= H j (x)e- x l\ 



where Hj(x) are the Hermite polynomials (see, e.g., [12]). 

Notably, the vacuum state corresponding to zero photons has nonzero 
energy 1/2, a purely quantum phenomenon called vacuum fluctuations re- 
flected in the fact that the distributions of Q and P are Gaussian with 
variance 1/2. We would like to stress here that the Gaussian distribution 
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emerges directly from physical principles and it is the same Gaussian char- 
acter of the vacuum which will lead to our model for the detection noise in 
Section 2.4. 

An interesting consequence of relation (9) is found by taking X to be the 
vacuum state P^ whose Wigner function is Wx.(q,p) = exp(— q 2 — p 2 )/^. 
Then, as the left-hand side of the equation is positive, this implies that 
the negative patches of W p around the origin must be balanced by positive 
ones in such a way that the integral remains positive. In fact this property 
holds for any point in the plane and the localized oscillations of the Wigner 
function are a signature of nonclassical states, such as states with a fixed 
number of photons or the so-called "Schrddinger cat states" like the one 
estimated in Figure 3. 

On the other hand, there exist probability densities that are not Wigner 
functions, for example, the latter cannot be too "peaked" (cf. [21]): 

(11) \W p (q,p)\ < - for all (q,p) G M 2 . 

7T 

A general density matrix p can be seen as an infinite-dimensional matrix 
with coefficients pjk = (tpj,ptpk) for j,k>0 such that J2k>o Pkk = 1 (trace 1), 
and [pjk] > (positive definite matrix). In particular, the diagonal elements 
Pk = Pkk represent the probability of measuring k photons for a system in 
state p. The density p p (x, 4>) is given in terms of the matrix elements of p 
by 

^ oo oo 

(12) p p (x,(j)) = - J2 PjkPjk{x,4>) ■= ~ J2 Pjk^j(x)^k(x)e- i{j - k),f> , 

71 j,k=0 11 j,k=0 

and a similar formula holds for the Wigner function W p (q,p) = J2j°k=o PjkWjk{q, 
p), with Wjk such that 1Z[Wjk] =Pjk- F° r an Y density matrices p,r (8) can 
be written 

\\W P -W T \\ 2 := f f \W p (q,p)-W T (q,p)\ 2 dpdq 

(13) 

1 1 

j,k=0 

Some examples of quantum states that can be created at this moment 
in the lab are given in Table 1 of [2]. Typically, the corresponding Wigner 
functions have a Gaussian tail but need not be positive. As a consequence of 
(11) not all two-dimensional Gaussian distributions are Wigner functions, 
but only those for which the determinant of the covariance matrix is at 
least j. Equality is obtained for a remarkable set of states called squeezed 
vacuum states having Wigner functions W(q,p) = ^exp(—e 2 ^q 2 — e~ 2 ^p 2 ), 
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determined by the squeezing factor £. More generally, the celebrated Heisen- 
berg uncertainty relation says that for any state p the noncommuting observ- 
ables P and Q cannot have probability distributions such that the product 
of their variances is smaller than |. 

2.4. Experimental setup and noisy observations. The optical setup sketched 
in Figure 1 consists of an additional laser of high intensity \z\ 2 S> 1 called a 
local oscillator, a beam splitter through which the cavity pulse prepared in 
state p is mixed with the laser, and two photodetectors each measuring one 
of the two beams and producing currents I12 proportional to the number 
of photons. An electronic device produces the result of the measurement by 
taking the difference of the two currents, integrating it over the time inter- 
val of the pulse, and rescaling it by a factor proportional to \z\ (see below). 
A detailed analysis taking into account various losses (mode mismatching, 
detection inefficiency) in the detection process can be found in [21]. It turns 
out that all these losses can be modeled by a Gaussian noise in the measure- 
ment results, and here we detail only the case of detection inefficiency. In 
the high photon number regime \z\ 2 ^> 1 the (integrated) current depends 
linearly on the intensity of the beam with a proportion 77 < 1 of the photons 
being detected. The process can be described classically by considering that 
each individual photon has probability 77 of being detected and 1 — 77 of being 
absorbed without detection. Thus in a beam of j photons the probability 
of detecting k < j is b 3 k {rj) = (j)rf k (l — Tjy~ k , and for an incoming state p 

we obtain the probability distribution of the results Pk(v) = Y^jLkPjjb 3 k (r)). 
This "photon lottery" can be equivalently described by replacing the re- 
alistic detector with an ideal one in front of which we place an imaginary 
beam splitter (see Figure 1) which has transmissivity t = and reflectivity 
r = y/1 — rj. 

In order to understand why this is the case and how the measurement 
noise appears, we will present two equivalent pictures of the action of the 
beam splitter stemming from the wave-particle duality typical in quantum 
mechanics. As shown in Figure 1 a beam splitter receives two incoming 
beams and has two outgoing beams as output. In the case of the imaginary 
beam splitter sitting in front of the detector, one of the incoming beams is 
the vacuum and let us assume that the beam to be measured has j photons. 
Then the joint state of the two beams is ipo <£> ipj £ L2(R) (8> L2OR) and the 
transformation to the outgoing vector is ipo®iJjj 1— > 2i=oK(^)] V'.7-fc ( 8 ) V , fc> 
which simply means that with probability 6^(77) we get k photons going to 
the ideal detector and j — k will not be detected, as described above. 

The second description is in terms of the transformation of the electric and 
magnetic field operators of the beams denoted by (Qz, Pz) and (Q r , P r ), with 
the first couple acting on the left side of the tensor product L2QR) ® IL<2(R) 
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Fig. 1. Quantum homodyne tomography measurement setup. 



and the second pair on the right side. The fields of the outgoing beams are 

= tQi — rQ r , Q' r = rQi + tQ r and similarly for P's. 

Then by computing the combined effects of the beam splitters, we have 
the fields arriving at the two detectors, Qi = ^[Q + Qio] — ?"Qivac and 
Q2 = 775 [Q — Qio] — fQ2 vac , and similarly for Pi, P2. We remind the reader 

that the number of photons in a beam is described by N := ^(Q 2 + P 2 — 1). 
Using the fact that in the limit \z\ 2 3> 1 the laser can be treated classically 

\z\ \z\ 

by replacing Qi Q by -^=cos^> and Pi G by ^sintp, we get 

Ni - N 2 = V2t\z\ [(tQt + rQJ ac ) + Odzp 1 )], 

with 0(| a term whose variance is bounded by C/\z\, and Q^ a 
quadrature operator of a vacuum mode accounting for the two fictitious 
beam splitters. Thus in the limit \z\ — > 00 the rescaled integrated current dif- 
ference I\ — 12/ y/2rj\z\ has the same distribution as iQ^ + ^Q™ , that is, that 
of the sum of two independent random variables Y := ^/rjX + — rj)/2£, 
where X ~ P (O (-|0) is the result of measuring X^, £ has the N(0, 1) law and 
—t=£ has the distribution of the quadrature in the vacuum (see Section 2.3). 



12 



C. BUTUCEA, M. GUTA AND L. ARTILES 



The efficiency-corrected probability density is then the convolution 



(14) J%M) = (*{!- v))~ 1/2 Pp ( 



dx. 



Finally, the constants \z\ and r\ are measured in advance as part of the 
calibration of the experiment and are considered to be known. 

3. Statistical procedure and results. For convenience we summarize now 
the statistical problem tackled in this paper. 

Consider (Xi, . . . , (X n , 3> n ), independent identically distributed ran- 
dom variables with values in R x [0, n] and distribution ¥ p having density 
p p (x,4>) with respect to -A, A being the Lebesgue measure on R x [0,7r], 
given by 

Pp(x,<j>) = K\W p ](x,<l>), 

where 1Z is the Radon transform defined in (5) and W p : R 2 — ► R is a so-called 
Wigner function which we want to estimate. The space of all possible Wigner 
functions is parametrized by infinite-dimensional matrices p = [pjk\^ k= Q such 
that Trp= 1 (trace 1) and p> (positive definite), in the way indicated 
by (6). Moreover, the correspondence between p and W p is one-to-one and 
isometric with respect to the L2 norms as in (13). The properties of Wigner 
functions have been discussed in Section 2.3, in particular the fact that W p 
may take negative values. 

What we observe are not the variables (Xg, &i) but the noisy ones (Yi, <E>i), 
...,(Y„, <£>„), where 

(15) Yt := y/rjX t + ^/(l-77)/2&, 

with £i a sequence of independent identically distributed standard Gaussians 
which are independent of all (Xj, $.,■). The parameter < r/ < 1 is known and 
we denote by the density of (Y^,$£) given by the convolution (14). The 
aim is to recover the Wigner function W p from the noisy observations. 

Class of Wigner functions. In order to apply the minimax estimation 
technology we will assume that the unknown Wigner function is infinitely 
differ entiable and belongs to the following class described via its Fourier 
transform: 

A(P,r,L) = (w p Wigner function: J \ W p {w)\ 2 e mw ^ dw < (2ir) 2 L 

where < r < 2, and (3,L > 0. From now on we denote by (•, •) and || • || 
the usual Euclidean scalar product and norm, while C(-) will denote posi- 
tive constants depending on parameters given in the parentheses. From the 
physical point of view the choice of a class of very smooth Wigner functions 
seems to be quite reasonable considering that to date no quantum state of 
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light has been constructed which does not satisfy such conditions. The rea- 
son for the difficulty in engineering states with less smooth Wigner functions 
is that the interactions needed to produce such states should be very non- 
linear in the electric and magnetic fields while it is known that photons are 
rather weakly interacting particles. For example, until recently the creation 
of squeezed states requiring a quadratic interaction was a not-trivial achieve- 
ment [5] . We mention here without proof the result of a computation showing 
that if a density matrix p satisfies the condition Tr(pexp[aN r//2 ]) < oo for 
some a, r > 0, then W p € A(0, r, L) for some f3, L > 0. In light of the previous 
argument we consider that this condition is actually rather weak. 

Estimation method. For the problem of estimating a probability density 
f:R 2 ->R directly from data (X e ,$ e ) with density K[f] we refer to the 
literature on X-ray tomography and PET, studied in [8, 19, 20, 28], and the 
references therein. In the context of tomography of bounded objects with 
noisy observations, Goldenshluger and Spokoiny [14] solved the problem 
of estimating the borders of the object (the support). For the problem of 
Wigner function estimation when no noise is present, we mention the parallel 
work [16]. 

Let N v denote the density of the rescaled noise — f?)/2£ and let N v be 
its Fourier transform. Denote by Pp(y,<p) the probability density of (Yg,$^) 
in (14). Then 

where p * q(y) = J p{y — x)q(x) dx denotes the convolution of p and q. Via a 
change of variable we can write p^,(y, (f>) as in (14). In the Fourier domain this 

relation becomes ^Fi\pl,(', $)](*) = F\\pp{'i 4>)](ty/v)N v (t), where T\ denotes 
the Fourier transform with respect to the first variable. 

In this paper we modify the usual tomography kernel in order to take 
into account the additive noise on the observations and construct a kernel 
that asymptotically performs both deconvolution and inverse Radon 
transformation on our data. Let us define the estimator 

1 n / Ye \ 

(16) Wl n (q, p) = -J2 K h[V cos ^ + P sin ^ " -7= > 

where < r/ < 1 is a fixed parameter, and the kernel is defined by 
K v (u) = J_ [ 1/h eM-iut)\t\ 
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and h > tends to when n — > oo in a proper way to be chosen later. 
For simplicity, let us denote z = (q,p) and [z, <j>\ = q cos (ft + psm(p; then the 
estimator can be written 



This is a one-step procedure for treating two successive inverse problems. 
The main difference with the no-noise problem treated by Gu^a and Artiles 
[16] is that the deconvolution is more difficult than inverse Radon transfor- 
mation, and thus the techniques for proving the optimality of the method 
(lower bound) are essentially different. Technically, the no-noise kernel-type 
estimator has dominating variance, while in the case of noisy observations 
the bias dominates the variance, as we will see later on. 

In Section 3.1 we analyze the mean squared error (MSE) at some fixed 
point. Our results concern minimax efficiency and adaptive optimality for 
this problem. We compute an upper bound for the convergence rate of the 
proposed estimator by minimizing the sum of upper bounds (uniform over 
the whole class) of the bias and of the variance. The optimality in rate of 
our estimator follows from the lower bounds, which are proved in Section 6. 
The meaning of the lower bounds results is that asymptotically, no other es- 
timation technique could outperform our method uniformly over all Wigner 
functions in the given class. Moreover, we prove the lower bounds, including 
the asymptotic constant (sharp minimax). 

We use a technique based on two hypotheses that appeared in [11] for 
periodic Sobolev classes and in [6] for classes of supersmooth functions, to 
which we refer for the details of some of the computations. We concentrate on 
the main construction involved in the lower bound, that is, the choice of two 
hypotheses belonging to the fixed class of Wigner functions such that their 
values in a fixed point are sufficiently different while their corresponding 
models have likelihoods close to each other. 

Despite the generality of a minimax sharp estimator, for practical pur- 
poses it is not obvious how to choose the smoothness parameters r and (5. 
Therefore, an adaptive method (i.e., free of prior knowledge of parameters 
P, r and L provided that they are in some set) is designed for classes with 
r < 1 in Section 3.2. They behave as well as the previous estimators, provided 
that we know maximal values of parameters. In particular, this estimator is 
optimal adaptive (i.e., adaptive and attaining the minimax rate) and effi- 
cient. We note that in general such procedures do not always exist. We are 
fortunate in our case and this is mainly due to the dominating bias. 




3.1. Pointwise estimation. In this section we give minimax and adaptive 
results for the pointwise risk (MSE) for the estimator Wj! in (16). The next 
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proposition contains upper bounds for the two components of the risk, the 
bias and variance, as functions of the parameter h and the number n of 
samples. The bounds are uniform over all Wigner functions in the class 
A(P,r,L). 

Proposition 1. Let (Y£,<$>£),£ = l,...,n, be i.i.d. data coming from the 
model (15) and let Wj! be an estimator (with h — > as n — > oo) of the 
underlying Wigner function W p belonging to the class A([3,r,L), with < 
r < 2. Then 

sup sup \E[Wl n (z)) -W p (z)\ 2 = ^exp(-f ) (l + o(l)), 
sup sup E[|^)-^y.)]| 2 ]<^-exp(g) (1 + 0(1)), 

z&? W p GA(/3,r,L) »7 n \h J 

where 7 = (1 — rj)/(Arj), and o(l) ^ as h — > and n — > 00. 

The pointwise convergence rate of Wfi n with h = h opt is then shown to 
be minimax by proving an additional lower bound. 



Theorem 1. Let (3 > 0, L > 0, < r < 2 and (Yi,$i),£ = l,...,n, be 
i.i.d. data coming from the model (15), and let Wj[ n be as defined in (16) 
with the kernel Kj^ of (17) and let the bandwidth h op t be given by the solution 
of 

(18) lL + ^ =Xogn . 

""opt "opt 

Then W^ n satisfies the following upper bounds in pointwise distance: 
hmsupsup sup E[\W^ n (z)-W p (z)\ 2 ]^- 2 <C, 

n ^°° zGR 2 Wp£A(/3,r,L) 

where the constant C and the pointwise rate are 

C=l, <Pn = - A a ex P ~TT- if0<r<2, 



4vr/?r V h r opt/ 

OO, (fl = n-^/^+T) ifr = 2. 

Moreover, the previous rate is minimax efficient for < r < 2 and nearly 
minimax for r = 2; that is, the following lower bounds hold: 



liminfinf sup K[\W n (z) - W p (z)f]<p~ 2 > 1 Vz G R </ < r < 2, 

n ~*°° W n W p eA(/3,r,L) 
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liminf inf sup E[|W n (z) - W p {z)\ 2 }{n\ognf > c > 

VzGl?/r = 2, 



3 W n W p eA{j3,2,L) 



where inf ^ is taken over all possible estimators W n of the Wigner function 
W p . 

Proof. The proof of the lower bounds is given in Section 6. 
Sketch of proof of the upper bounds. By Proposition 1 we write 

sup sup K[\WZ n (z)-W p (z)\ 2 ]<C B h r - 2 e W (-^) + ^e W (^), 

where Cb and Cy denote the constant terms, depending on f3,r,L and rj. 
We select the best bandwidth as h opt = arginf/ t> o{Cs/i r_2 exp(— 2(3 /h r ) + 
Cy/nexp(27//i 2 )}. By taking derivatives we get 

%- + = logn + C(l + o(l)) as n ^ oo, 

where C > depends on (3, r, L and rj. This allows us to take h opt as in (18) 
and check that up to constants 

^opt 2 exp(-^-) = K~? ■ ^xp(^g-) (1 + o(l)) ~ K~? Var(^ optjn (z)), 

that is, the bias term is asymptotically larger than the variance term, for all 
< r < 2, and they are of the same order if r = 2. □ 

Remarks on bandwidths and rates. The bandwidth (18) and con- 
sequently the rates are given in an implicit form. We show now that more 
explicit expressions can be obtained, if we restrict to values of r in certain 
intervals. 

If r < 1, then it suffices to take bandwidth 



hi 



logn (3 / log 



n ^ r / 2 \ -1/2 



and the bias term is larger than the variance term (for h = h\) and of the 
same order as Lp 2 n (for h = h op t): 

L (lognY^/ 2 ( /lognV/ 2 

exp [-2/3 — — +o(l) 



Airfir V 27 / V V 27 

If 1 < r < 4/3, then we take h 2 = - ^h\ r y x l 2 and we get the risk 
bound (for h = h<i) of the same order as ip\ (for h = h op t). 

exp [-20 — — +Ci(r,/3,7) — — - o(l. 



4vr/3r V 2 7 / "V V 2 7 J v " " \ 2 7 
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In general one has to consider separately the cases (k — l)/k<r/2<k/(k + 
!)• 

We deal with a composition of two ill-posed inverse problems with the 
deconvolution being the dominating factor and the inverse Radon transfor- 
mation bringing corrections to the usual rates. For r = 1 we can compare 
our result with that of [16] for the idealized tomography model without 
noise. While the latter is almost parametric, in the presence of deconvo- 
lution the rates decrease to a factor ydog n exp ( — c-^/log n ) , which is faster 
than (logn) _a but slower than power n~ a rates, for any a > 0. Compared 
with the density estimation in the convolution model of [6], we get an ad- 
ditional logarithmic factor h~^ t /2 in the rates due to the presence of the 
inverse Radon transformation. However, as we will see later, an important 
difference with [6] is the proof of the lower bound requiring the construction 
of a "most difficult" family of Wigner functions. 

3.2. Optimal adaptive estimation. In the previous theorem the kernel 
estimator WV has a bandwidth h = /i op t which is the solution of (18) de- 
pending on the parameters (3 and r of the class. In the next theorem we will 
show that there exists an adaptive estimator, that is, not depending on the 
parameters, performing as well as the former estimators, provided that they 
lie in the set B = {(J3, r, L) : (3 > 0, < r < 1, L > 0}. 

Theorem 2. Let (Yf, = 1, . . . ,n, be i.i.d. data coming from the 
model (15). ThenWj[ n with h = /i a< j, 



ad 



2rj log n 2rj log n 
1 — T) y 1 — T] 



-1/2 



is an optimal adaptive estimator over the set of parameters B. That is, the 
estimator attains the same upper bounds, for all (/3,r, L) G B, 

limsup sup E[|W£ (z) - W p (z)\ 2 ]tp~ 2 < 1 VzGl 2 , 

n->oo W p £A(l3,r,L) ad ' 

where the rate (p~ 2 is given in Theorem 1 for the case < r < 1. 

For the proof of this theorem we refer to a similar result of [6] . Note that 
ft, a d is a fixed quantity and does not depend on the data. An important 
consequence is that in conjunction with the lower bounds in Theorem 1, 
the estimator WV is optimal adaptive and efficient over the set B for the 
pointwise risk. This means it attains the minimax rate and the constant 
C = 1 for an estimator free of (3, r and L provided that these parameters are 
in the class B. 
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4. Practical implementation. We study three Wigner functions, each one 
belonging to some class A(f3, 2, L) with arbitrary (3 < 1/4. The one- and two- 
photon states are described by diagonal density matrices with pjj = 5j t \ and, 
respectively, pjj = Sj^, and can be readily produced in the lab. The third 
state is a so-called Schrodinger cat state which is represented by the sum 
of two vectors corresponding to laser states, and which may be available 
experimentally in the near future. 

For the one-photon state, we simulated n = 5000 noisy data (Yg, 3>^) by 
first generating (Xi, &i) having density p p (x, cp) and then adding the noise by 
using standard Gaussians ^ and detection efficiency r] = 0.9. We calculated 
the estimator W^ n with optimal bandwidth h opt = (logra/ ' (2(3+2^))~ l l 2 . We 
then reconsidered the kernel function and localized it by using a modified 
kernel having Fourier transform 




1 2 3 4 S 6 7 e 9 12 3 4 5 6 7 8 9 1234 56789 

Colunwi Number Column Number Column Nwnbe' 



Fig. 2. Left: One-photon state, n = 0.9, n = 5000. Middle: Same data, modified kernel. 
Right: Two-photon state, n = 0.95, n= 10,000. 
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x(;( W <l) + exp(^-- w i— y )/Q< W <|)). 

The function K lri is an infinitely differentiable function (much smoother than 
K 11 )] thus K lr> decays exponentially fast. In Figure 2 we plot a transversal cut 
corresponding to the line p = 0, passing through the most difficult point to 
estimate (0, 0), in which the error is dominated by the bias. The true Wigner 
function is plotted with a continuous line and the dashed line represents an 
estimator for one sample of size n. The graphics on the left-hand side concern 
the one-photon state with the original kernel estimator while the graphics 
in the middle show the estimator with the modified kernel (19) at the same 
bandwidth. An important improvement can be noticed in the case of the 
kernel K' v . The left column concerns the two-photon state with modified 
kernel. The pointwise loss was then computed for ten samples (each of size 
n = 5000) at points (0, 0), (0, ±0.5), (0, ±1), (0, ±1.5) and (0,±2) and the 
corresponding boxplots are shown in the lower panels of Figure 2. We notice 
that the highest losses are indeed observed at (0, 0) and that the losses 
are quite stable from one sample to another. In the case of the Wigner 




-4 -5 ~2 -1 1 2 3 4 



Fig. 3. Contour plot of the estimated Wigner function for the Schrodinger cat state. 
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Fig. 4. Transversal cuts through the Wigner function for the Schrodinger cat state. 
Top: Estimated Wigner function (dashed line), rj — 0.95, n = 500,000. Bottom/ Estimated 
Wigner function (dashed line) and estimator without deconvolution (dash-dotted line), 
7\ = 0.85, atn = 500,000. 

function of the Schrodinger cat state we considered samples larger than 
10,000 data which we binned in a 100 x 100 histogram. Figure 3 shows a 
contour plot of our estimator W^ n for a sample of size n = 500,000 and rj = 
0.95. Characteristic features are clearly visible: two Gaussian-shaped domes 
on the sides with positive (thin lines) and negative (thick lines) oscillations in 
the center. A similar estimator has been computed for rj = 0.85 and Figure 4 
shows different cuts through these estimators (dashed lines) compared with 
the true Wigner function (continuous line). The relatively worse performance 
in the case rj = 0.85 is confirmed by Table 1 which gives the mean square 
errors over 100 samples of size n at different peaked or flat points (q,p) 
of the Wigner function and for the two different noise levels, rj = 0.95 and 
rj = 0.85. Tomographic reconstruction with real data was considered in [5]. 
However, in this reference no Gaussian deconvolution is performed. Thus one 
actually estimates a convoluted Wigner function WJj = 1Z~ l [p r, p ] with usual 
parametric rate within logn factors [8]. We have tested such an estimator 
for the case of the Schrodinger cat state with n = 500,000 and rj = 0.85 and 
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cuts through the Wigner function. A result we obtained is the dash-dotted 
line shown in the panels in the lower part of Figure 4. This cut can be 
compared with the dashed line representing our estimator, which performs 
both inverse Radon transformation and deconvolution. 



5. Proofs of upper bounds. 



Proof of Proposition 1. Since our data are i.i.d., we write 
nwUz)} = IT f Kldz^-y/^Jy^dyd^ 



vr Jo 

= \ [ K l * (Vvp v P (-Vv,miz, </>}) d4>. 

Now, write the convolution in the integral as an inverse Fourier transform. 
Indeed, it has Fourier transform [see (17)] 

^♦(^(•v^^))](*) = ^(t)^i^(->)](*/v^3 

= \\t\r l [p p {-A)]{t)I{\t\<l/h). 

Replace this into the expected value of our estimator and use (7): 

Efefz^ At r e- u ^Mt\W p (t cos (f>,tsm(j))dtd<i> 
4tt z Jo J-i/h 

(20) = 4^/ / e ~ i(qU+PV) W P (u,v)I{Vu 2 + v 2 < 1/h) dudv 

= 4^2 / e- l ^ w) W p (w)I(\\w\\ < l/h)dw, 
where we denote w = (u,v). We recall that we also have 



Table 1 

Schrodinger cat state: MSE x 10 for 100 samples of size n at points 
(<7iP) for r] — 0.95 (left side) and r\ = 0.85 (right side) 



(q,p) : n 10,000 100,000 500,000 10,000 100,000 500,000 



(0,0) 


507 


173 


119 


1224 


330 


229 


(0,3) 


54 


10 


4.16 


428 


161 


67.9 


(0,2.5) 


56.9 


14.1 


4.5 


361 


181 


67.7 


(0.5,0) 


414 


113 


70.1 


909 


258 


164 


(3,0) 


29.7 


7.09 


1.66 


225 


94.6 


31.1 
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and then we write for the pointwise bias of our estimator, 



|E[H^J(*)-W,(z)| a = ^ 



-i(z,w) 



{F[E[Wl n ]]{w)-W p {w)}dw 



< 



(4vr 



2\2 



\W p (w)\ 2 e 2 ^ r dw 



-m\M\ r dw 



\\w\\>l/h 

< L^ g _ 2/3/fe , (i + o(i)) ag ^^ 0) 



47r/3r 

by the assumption on our class. As for the variance of our estimator, 
V[WUz)) =n\WUz) -E[WUz)]\ 2 ] 



(21) 



< — E 
n 

~ njo 



(ffJ(M]-s/ViB)V?(!/,«<iy#. 



At this point, let us denote 

G(t) :=F[KZ([z,4>] - -/y^Ki) = ^^Kl{-t^). 
Replace in (21) by taking into account that for a probability density i 



we have |^i[p2('j 



<i, 



7T I 
2^ 



G*G(t)F 1 \p*(;<!>)](t)dt 

2 



< 



\t\<i/(h^) Nv(t) 



dt 



Finally we obtain 



(22) E 



<- 2 \2<l 



vcv**) t 



- exp t 



1-rj 



dt 



Let us note here that, more generally, for any positive a, s and for any i£t, 
we can use integration by parts to get the asymptotic evaluation 

(23) [ X t A exp{at s ) dt = — x A+l ~ s exp(ax s ){l + o(l)) asx^oo. 
Jo as 

We use formula (23) for the integral in (22) as 1/h — > oo, and with (21) we 
get 

,2 



V[Wl n (z) 



< 



2if 
(1 — rj) 2 n 



exp 



1-r? 1 
2r) h? 



n — ► oo. 
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□ 



Proof of Theorem 2. Over B we have 

n\WlJz) - W p (z)\ 2 ] < J^.( had y- 2 e W ( 



2rj 2 fl-i) 
+ T, 4t~ ex P 



(1- V 2 )n ^V2 ?? (/ lad )V' 
and it is easy to check that, for (j3,r,L) £ B, 



M^^) = exp (~v^ log ") = ° <1)exp (-C)^ 

Thus, n attains precisely the rate ip^ (C = 1). □ 

6. Proof of lower bounds. In this section we will construct a pair of 
Wigner functions W\ and W<i depending on a parameter h such that h — ► 
as n — > oo. The choice of /i [see (31)] is such that it insures the existence 
of the lower bound in Theorem 1, and it should not be confused with the 
window h appearing in the expression of the estimator which is optimal with 
respect to the upper bounds. We choose W\ and W 2 of the forms 

W 1 (z) = W (z) + V h (z) and W 2 (z) = W (z) -V~ h (z), 

where Wo is a fixed Wigner function corresponding to the density matrix pq. 
The function Vz is not a Wigner function of a density matrix but belongs to 
the linear span of the space of Wigner functions and thus has a corresponding 
matrix r h in the linear span of density matrices. The choice of Wo, V% is such 
that 



Pi 



p Q + t h and p 2 = p - r h 



are density matrices (positive and trace equal to 1) with Radon transforms 
pi and p2- Suppose that the following conditions are satisfied: 

(24) Wi and W 2 belong to the class A(/3,r,L), 



(25) \W 2 (z)-W 1 (z)\>2<p n (l + o(l)) asrwoo 

(26) 



2 r f (pl M )-pi M 2 m 

nx / v dyd<f) = o{l) 

Jo J p{{yA) 



as n — > oo. 
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Then we reduce the minimax risk to these two functions, W\ and W2, and 
bound the max from below by the mean of the two risks, to get for some 
0<r < 1, 

inf sup E[\W n (z)-W p (z)\ 2 } 

W n W p £A(/3,r,L) 



> 



inf — 

w n 2 



E pl [\W n {z)-W 1 {z)\] 



+ (l-r)E 



pi 



> 



(1 



ei > 1 



W n {z) - W 2 (z) 
(l + o(l)). 



We use the triangle inequality to get rid of the estimator and (25). Following 
Lemma 4 in [6], we know that the last probability in the display above 
is bounded from below by 1 — r 2 provided that nx 2 < t 4 . It is therefore 
sufficient to check (26), in order to find r n — > 0, as n — > 00 and give a lower 
bound of the minimax risk of order ^(1 + o(l)), for any estimator W n . 

We construct first the functions Wi 2 and then prove (24)-(26) in Sec- 
tion 6.3. Note that for the case r = 2 we prove a weaker form of (26): 
nx 2 = 0(1) as n — ► 00. The same reasoning as above shows that c/) 2 is then 
the optimal rate up to some constant (depending on some fixed r). 



6.1. Construction of the density matrix pq. In this section we will con- 
struct a family of density matrices p a ^ from which we will later select 
po = p a °'£° used in the lower bound. We derive their asymptotic behavior in 
Lemmas 1 and 2, and we show that Wj belongs to the class A((3, r, L) for 
a > small enough and £ close to 1 . 

Let us consider the Mehler formula (see [12], 10.13.22) 



1 



( a l-* 



(27) Y z k — , H k (x) 2 e- x = , „ exp [ -x 2 , 

to V^k\2 k kK ' 0r(l-^) P V 1 + 

where are the Hermite polynomials. Integrating both terms with /J (2) 
a((l - z)/(l - £)) a I(£ < z< 1), for some < a,£ < 1, we get 



00 ! 

pKx,^^^^^) 2 / 

,t^, Jo 



(28) 



fc=0 

Pl _/S(*) 

'0 



QHT WITH NOISY DATA 25 

where ipk are the orthonormal vectors defined in (10). The Fourier transform 
of p| is 

(29) wj( w ) = ^ 1 [p|](|| t ,||^ )= | o 1 ^exp(-||^|| 2 J i ±^)^. 

Notice that the normalization condition / p| = 1 is equivalent to Wj(0) = 1, 
which is satisfied for the chosen functions /|, and thus p| is a probability 
density. From the first equality in (28) we deduce that p| is the probability 
density corresponding to a diagonal density matrix p a ^ with elements pt't = 
SoZ k fi{z)dz. We look now at the behavior of p^(x,<p) with respect to x. 

Lemma 1. For all < a,£ < 1 and \x\ > 1 there exist constants c,C 
depending on a and £, such that c|x|~( 1+2a ) <p a (x,(p) < C\x\~( 1+2a \ 



Proof. We have 



(1-Z) a -V2 / 2 1_ Z 



which by the change of variables u = xJ jxf becomes 



By denoting g(u) = u 2a exp(— u 2 ), the last integral is bounded for |x| > 1 as 
follows: 



(l-O a V^\x\ 2a+1 



g{u) du 



ft \ a2 a+1 f°° . , , 

A similar analysis can be done for the matrix elements of p a . In the 
particular case a = 1 and £ = we have pj^ = ( fc+1 ) 1 ( fc+2 ) • 

Lemma 2. For all < a,£ < 1 toe /iai>e 

= (I^ r(a + l)*" (14a) (l + °«) « " - oo. 

PROOF. We notice that by definition of p?'f and the property 



o 



T{2 + a + k) ' 
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Q 



_a r(l + g)r(l + fe) 

Pk,k (l_f)a T(2 + a + k) 



< 



(WWo 
(1-0°' 



i z k {l-z) a dz 



Now, using Stirling's approximation for the function V (see [1], 6.1.47) 
we deduce that T(l + k)/T{2 + a + k) = /H 1+a )(l + o(l)) and, given that 
k 1+a ^ k+1 = o(l), we obtain the desired result. □ 

Lemma 3. For any (j3,r,L) such that < r < 2, there exist < a,£ < 1 
such that Wj belongs to the class A((3,r,L). 

Proof. Using (29) we get 

e W\M r \wi{w)\ 2 dw 



te 2 ^ ( / 1 exp f-t 2 4+^ cfe) 2 A 



o \J 1-z \ 4(1 -z 

" te 2 ^" ( f 1 (1 - z)^ 1 exp f - — ^— + l)dz) 2 dt 



(l-0 2a Jo \Jt ' ' V 2(1- z) 4 

< t exp (2fff- - ^^1 ) A < C(/J, r,f ), 

where C(f3,r,£) > can be made smaller than (2ir) 2 L for any < r < 2 and 
for < £ < 1 close enough to 1 . □ 

6.2. Construction of and asymptotic properties of p h . Let be the 
function defined on M 2 whose Fourier transform is 

^ 2 [^]H=V-H:=^(t) 



2V^h 1 ~ r / 2 e^' hr e- 2 ^ r j(\t\ r - i) ; 



where i = ||u>||, and J is a three-times continuously differentiable function 
with bounded derivatives and such that I[28,d-28]( u ) < J( u ) < ^[<5,D-5]( M )) 
for some 5 > and D > 45. The choice of the function V% is motivated for 
the case < r < 2 by the results on lower bounds for deconvolution obtained 
in [6] . The parameter h — > as n — ► oo is solution of the equation 

(31) ^ + ^=logn + (loglogn) 2 . 
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When r = 2, we choose 

- _ ^ log(nlogra) ^ 1 / 2 
{S ) \ 2C9 + 7) J ■ 

We think of Vr as a function belonging to the linear span of the Wigner 
functions. Indeed, as shown in (13), the convex map sending a density matrix 
p to its corresponding Wigner function W p can be extended by linearity to 
an isometry (up to a constant) with respect to the || • ||2 norm on the two 
spaces. We can thus construct a matrix r h belonging to the linear span of 
the space of density matrices and whose corresponding Wigner function is 
V^. Because the function is invariant under rotations in the plane, the 
corresponding matrix has all off-diagonal elements equal to and for the 
diagonal ones we can use the formula (from [21]) 

(33) r fc \ = 47r 2 / L fc (t 2 /2)e"* / 4 tJ~ h (t)dt, 

J 

where L k are the Laguerre polynomials defined in the proof of the following 
lemma. 



Lemma 4. The matrix r has the asymptotic behavior 
(34) ri = 0(k-^)o h (l). 

Proof. We use the differential equation of the Laguerre polynomials 
(see [15], 8.979), L k (x) = \{{x - l)L' k (x) - xL'^x)). Thus 

j t L k (t 2 /2) = tL' k (t 2 /2) and ^L k (t 2 /2) = L' k (t"/2) + t 2 L£(i 2 /2), 
which implies 

and 

Lk{t 2 /2) = ^ (V - l)r l j t L k {t 2 /2) - ^L k (t 2 /2)^j . 
Using integration by parts we obtain 

T * k= kl Lk{t2/2)e ^ /4 [ft(<WO + ft(*)J&(*) + ft(<)J&(*)]«a, 

with Pi(t) polynomials of degree at most 3, whose coefficients do not depend 
on /i or fc. As the support of the function under the integral is contained 
in the interval [1/h, 00), we can use the following bound for the behavior of 
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Laguerre polynomials (see [27], Theorem 8.9.12): sup,j. 6 [ lj00 ) e x l 2 \Lk(x)\ = 
0{k~ l / A ). The matrix r h has thus the asymptotic behavior 

POO 

rL<Ck-^ / _\P 1 (t)J Ji (t) + P 2 (t)j' h (t) + P 3 (t)4(t)\dt 

Jl/h 

= 0(k-^)o~ h (l). □ 

6.3. Proofs of (24) -(26) involved in the lower bound. Lemma 3 implies 
that for £ sufficiently close to 1, the Wigner function belongs to the 
class r, a 2 L). On the other hand, combining the results of Lemma 2 and 
Lemma 4 we get that for any a < 1/4 the diagonal matrices pi = p a ^ + r h 
and p2 = p a ^ — T h are positive and have trace 1 for h sufficiently small. Thus 
there exist aoj£o sucn that the corresponding p\ and p2 are density matrices 
and W = W|o eA(P,r,a 2 L). 

In the following proofs the constants 5 and D appear from the construc- 
tion of V%. The whole proof holds for arbitrarily small 5 > and arbitrarily 
large D > 45, hence the desired results. 

Proof of (24). By the triangle inequality 

The first term in the sum above is less than 2ir^/La. For the second one we 
have 

\T 2 [V } }(w)\ 2 e 2 ^ r dw 

= Jo I '* " 3=2 ^ ^ C ° S ^ 1 ^ 1 2e2/3|<l? ' dt d ^ 
t\\J h (t)\ 2 e 2 ^ r dt 

<47r 2 prLh 2 ~ r e 2 ^' hr [ \t\e~ 2 ^ r dt 

J8<\t\ r ~l/h r <D~8 



■ IT 



<47r 2 Le- 2 ^. 



Thus, if we take a = 1 - e -/M / 2 , we get W 1)2 in the class A{(3, r,L(l- e'^l 2 + 
e' 138 )) included in A(f3, r,L). □ 

PROOF of (25). Notice that |W 2 (2;) - W^i(z)| 2 is equal to 



1 

47T 2 7r2 



2 

3 i ^ w \W 2 (w)-W 1 ( y w))dw 



1 

4^ 



1 

2^2 
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-it z.i 



JO 



2-7T /"OO 



JO 



-it 2,1 



|i|(W2(tcos^,tsin^>) 

— Wi (£ cos 0, i sin (f>))dtt 
\t\J h (t)dtd(p 



Take z = without loss of generality: 

|W 2 (z)-Wi(*)| 2 

' 't\J~ h (t)dt 



2vr 7 

> 4vr/3rL/ l 2 - r e 2/3/ ' 1 
L 



1 



>4- 



47r/3r 



2lT J2S<\t\ r -l/h r <D-25 
h r - 2 e-W~V [e -4^ (1 + o(1)) _ e -2^(D-25) (1 + o(1))]2) 



which is larger than 4(/? 2 [e _4/3<5 — e _2 ^ D_2<5 ' l ] 2 (l + o(l)) for n large enough. 
Note that for < r < 2, the h solution of (31) provides exact lower bounds, 
while for r = 2, h given by (32) provides optimal rates of order (nlogn)~^^ /3+ ' y \ 
which are within a logarithmic factor optimal. □ 

Proof of (26). We want to bound from above nx 2 < ttu j{pl{y) — 
Pi(y)) 2 /Pi(y) dy. We have proven that pi(x) > Cx~ 2 for all \x\ > 1. It is 
easy to prove that after convolution with the Gaussian density of the noise 
the asymptotic decay cannot be faster; thus p\(y) > p-,V|y| > M, for some 
fixed M > 0. Then we split the integration domain into \y\ < M and \y\ > M 
and get 

(35) n X 2 <Cn(c{M)\\pl-pl\\ 2 + [ y 2 (pg(y) - p\{y)) 2 dy 
V J\y\>M 

Let us see first that 

^\\ 2 = C [ \Jr(t)\ 2 e~^ t2 l^dt 



9 2 ~ Pi I 



(36) 



Then 



< Ch l ~ r exp 



(2f3 



-A(3t r -(l- V )t 2 /(2 V ) dt 



<Ch 2 ~ r exp 



y 2 {P2(y) -P\{y)Ydy 



(l+8h r y/ r /h 

2(3 

h r 2rjh 2 J ' 



\>M 
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(37) 




t 2 e -4f3t r -(l~r,)t 2 



1^ dt 



For the case < r < 2 choose h as solution of (31) to get that the expressions 
in (36) and (37) tend to 0, and together with (35) this concludes the proof of 
(24). For the case r = 2, h given by (32), we get that the expression in (36) 
tends to and (37) stays bounded as n — > oo; thus we obtain the desired 
result. □ 
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